Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in
end-to-end encoder-decoder architectures for speech recognition. Separately,
connectionist temporal classification (CTC) has matured as an alignment-free,
non-autoregressive approach to sequence transduction, either by itself or in
various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully
self-attentional network for CTC, and show it is tractable and competitive for
end-to-end speech recognition. SAN-CTC trains quickly and outperforms existing
CTC models and most encoder-decoder models, with character error rates (CERs)
of 4.7% in 1 day on WSJ eval92 and 2.8% in 1 week on LibriSpeech test-clean,
with a fixed architecture and one GPU. Similar improvements hold for WERs after
LM decoding. We motivate the architecture for speech, evaluate position and
downsampling approaches, and explore how label alphabets (character, phoneme,
subword) affect attention heads and performance.
Comment: Accepted to ICASSP 2019
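The abstract does not include code, but the CTC collapse rule it builds on is easy to sketch. The function below (name ours, not from the paper) implements the standard greedy CTC decoding step: merge consecutive repeated labels, then drop blanks.

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse a frame-level CTC path into an output label sequence:
    merge consecutive repeats, then drop blank symbols."""
    out = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# Path over alphabet {0: blank, 1: 'a', 2: 'b'}; the blank between the
# two 1s is what lets CTC emit a genuine repeated label.
path = [0, 1, 1, 0, 1, 2, 2, 0]
print(ctc_greedy_decode(path))  # [1, 1, 2]
```

This collapse is what makes CTC alignment-free: many frame-level paths map to the same label sequence, and training sums over all of them.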
Conway's subprime Fibonacci sequences
It's the age-old recurrence with a twist: sum the last two terms and if the
result is composite, divide by its smallest prime divisor to get the next term
(e.g., 0, 1, 1, 2, 3, 5, 4, 3, 7, ...). These sequences exhibit pseudo-random
behaviour and generally terminate in a handful of cycles, properties
reminiscent of 3x+1 and related sequences. We examine the elementary properties
of these 'subprime' Fibonacci sequences.
Comment: 18 pages, 5 figures
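The recurrence is simple enough to state directly in code. This minimal sketch (function names ours) reproduces the example sequence from the abstract:

```python
def smallest_prime_factor(n):
    """Smallest prime divisor of n, or None for n < 2."""
    if n < 2:
        return None
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n is prime

def subprime_fib(a, b, length):
    """Subprime Fibonacci sequence: sum the last two terms, and if the
    sum is composite, divide it by its smallest prime factor."""
    seq = [a, b]
    for _ in range(length - 2):
        s = seq[-2] + seq[-1]
        p = smallest_prime_factor(s)
        if p is not None and p != s:  # s is composite
            s //= p
        seq.append(s)
    return seq

print(subprime_fib(0, 1, 9))  # [0, 1, 1, 2, 3, 5, 4, 3, 7]
```

For example, 3 + 5 = 8 is composite, so the next term is 8/2 = 4; then 5 + 4 = 9 gives 9/3 = 3, matching the sequence quoted in the abstract.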
Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
We propose a novel approach to semi-supervised automatic speech recognition
(ASR). We first exploit a large amount of unlabeled audio data via
representation learning, where we reconstruct a temporal slice of filterbank
features from past and future context frames. The resulting deep contextualized
acoustic representations (DeCoAR) are then used to train a CTC-based end-to-end
ASR system using a smaller amount of labeled audio data. In our experiments, we
show that systems trained on DeCoAR consistently outperform ones trained on
conventional filterbank features, giving 42% and 19% relative improvement over
the baseline on WSJ eval92 and LibriSpeech test-clean, respectively. Our
approach can drastically reduce the amount of labeled data required;
unsupervised training on LibriSpeech then supervision with 100 hours of labeled
data achieves performance on par with training on all 960 hours directly.
Pre-trained models and code will be released online.
Comment: Accepted to ICASSP 2020 (oral)
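As a toy illustration of the reconstruction objective described above (not the authors' implementation, with invented names), one can enumerate the (past context, target slice, future context) triples a model would train on:

```python
def decoar_slices(frames, slice_len=2):
    """Sketch of a DeCoAR-style reconstruction setup: at each position,
    a temporal slice of `slice_len` frames must be reconstructed from
    the frames before and after it. Returns
    (past_context, target_slice, future_context) triples."""
    examples = []
    for t in range(1, len(frames) - slice_len):
        past = frames[:t]                 # frames 0 .. t-1
        target = frames[t:t + slice_len]  # the slice to reconstruct
        future = frames[t + slice_len:]   # frames after the slice
        examples.append((past, target, future))
    return examples

# Toy "filterbank" sequence of five one-dimensional frames:
frames = [[0.1], [0.2], [0.3], [0.4], [0.5]]
for past, target, future in decoar_slices(frames):
    print(past, "->", target, "<-", future)
```

Because the target is reconstructed from both past and future frames, the learned representations are contextualized in both directions, which is what the labeled CTC training then builds on.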
Masked Language Model Scoring
Pretrained masked language models (MLMs) require finetuning for most NLP
tasks. Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood
scores (PLLs), which are computed by masking tokens one by one. We show that
PLLs outperform scores from autoregressive language models like GPT-2 in a
variety of tasks. By rescoring ASR and NMT hypotheses, RoBERTa reduces an
end-to-end LibriSpeech model's WER by 30% relative and adds up to +1.7 BLEU on
state-of-the-art baselines for low-resource translation pairs, with further
gains from domain adaptation. We attribute this success to PLL's unsupervised
expression of linguistic acceptability without a left-to-right bias, greatly
improving on scores from GPT-2 (+10 points on island effects, NPI licensing in
BLiMP). One can finetune MLMs to give scores without masking, enabling
computation in a single inference pass. In all, PLLs and their associated
pseudo-perplexities (PPPLs) enable plug-and-play use of the growing number of
pretrained MLMs; e.g., we use a single cross-lingual model to rescore
translations in multiple languages. We release our library for language model
scoring at https://github.com/awslabs/mlm-scoring.
Comment: ACL 2020 camera-ready (presented July 2020)
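The PLL and PPPL definitions can be sketched independently of any particular MLM. Here `masked_logprob` is a stand-in for a real masked-LM forward pass with position `i` masked, so the scorer below is schematic, not the released library:

```python
import math

def pseudo_log_likelihood(tokens, masked_logprob):
    """Pseudo-log-likelihood (PLL): mask each token in turn and sum the
    model's log-probability of the true token at the masked position."""
    return sum(masked_logprob(tokens, i) for i in range(len(tokens)))

def pseudo_perplexity(tokens, masked_logprob):
    """PPPL: exponentiated negative PLL per token, by analogy with
    perplexity for autoregressive models."""
    return math.exp(-pseudo_log_likelihood(tokens, masked_logprob) / len(tokens))

# Toy scorer: a "model" assigning probability 0.5 to every masked token.
toy = lambda toks, i: math.log(0.5)
sent = ["the", "cat", "sat"]
print(pseudo_perplexity(sent, toy))  # 2.0 for this uniform toy model
```

Note the cost implication the abstract alludes to: naive PLL needs one forward pass per token, which is why finetuning the MLM to score without masking (a single pass) matters in practice.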
Public Administration in the States and Reflections on Federalism
Federalism in Mexico, since its original conception in the Constitution of 1824, has been transforming and adapting itself to the new demands of states and municipalities, moving from centralization toward decentralization, with the aim of transferring more powers and responsibilities from the central government to other orders of government. In parallel, organizations such as the Conferencia Nacional de Gobernadores (Conago) have emerged for the states, and associations such as the Conferencia Nacional de Municipios de México (Conamm) for the municipalities, acting as a real counterweight that strengthens federalism. In what follows, we analyze some aspects of decentralization that are contributing to this strengthening in Mexico.
Contextual Phonetic Pretraining for End-to-end Utterance-level Language and Speaker Recognition
Pretrained contextual word representations in NLP have greatly improved
performance on various downstream tasks. For speech, we propose contextual
frame representations that capture phonetic information at the acoustic frame
level and can be used for utterance-level language, speaker, and speech
recognition. These representations come from the frame-wise intermediate
representations of an end-to-end, self-attentive ASR model (SAN-CTC) on spoken
utterances. We first train the model on the Fisher English corpus with
context-independent phoneme labels, then use its representations at inference
time as features for task-specific models on the NIST LRE07 closed-set language
recognition task and a Fisher speaker recognition task, giving significant
improvements over the state-of-the-art on both (e.g., language EER of 4.68% on
3sec utterances, 23% relative reduction in speaker EER). Results remain
competitive when using a novel dilated convolutional model for language
recognition, or when ASR pretraining is done with character labels only.
Comment: submitted to INTERSPEECH 2019
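A common, minimal way to turn such frame-level representations into utterance-level features (one of several choices, and not necessarily the paper's exact pooling) is to average over time:

```python
def utterance_embedding(frame_reprs):
    """Pool frame-wise representations (e.g. an ASR encoder's
    intermediate-layer outputs) into a single utterance-level feature
    vector by averaging over time, a typical first step before a
    downstream language or speaker classifier."""
    dim = len(frame_reprs[0])
    n = len(frame_reprs)
    return [sum(f[d] for f in frame_reprs) / n for d in range(dim)]

# Three 2-dimensional frame representations for one utterance:
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(utterance_embedding(frames))  # [3.0, 4.0]
```

The task-specific model then operates on this fixed-size vector regardless of utterance length.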
AESTHETICS OF ENTERTAINMENT
The aesthetics of entertainment bear on literary education, since every day we are seduced by the various media, which in turn flood our minds with constant information, satisfying in us a thirst to visualize our fantasies and to spend our time on something that will contribute little to our intellect. Given this pastime, we can see that this media-driven way of life is more common in our society and in the world than we think; as Omar Rincón tells us, "the entertainment society took shape as a way of life in the United States; music, cinema, films, and certain literary works were conceptualized" as such.
Implementation of Environmental-Aspect Monitoring for Environmental Quality during Storm Drainage Construction in the Carmen Alto District
In this research, air and noise environmental monitoring samples were taken
before, during, and after the execution of the project "MEJORAMIENTO Y
CREACION DEL SISTEMA DE DRENAJE PLUVIAL DE LA AV. CARMEN ALTO, AV. PERU Y
JR. CANGALLO, DISTRITO DE CARMEN ALTO – HUAMANGA – AYACUCHO", allowing us to
assess how the public investment project affected air quality and background
noise.
The methodology used to obtain the results follows the national
environmental-quality sampling protocols, and the monitoring results are
evaluated against the environmental quality standards (ECAs) for air and for noise.
The environmental monitoring carried out made it possible to identify the
impacts generated by the project's execution process and thus to propose
control and mitigation measures for the environmental impacts identified.
The research presents the results and the continuous improvement of the
management being carried out on the project, with measures proposed to
control elevated readings and the impacts in question.